Information Based Diagnostic for Genetic Variance Parameter Estimation in Multi-Environment Trials

Plant breeding programs evaluate varieties in series of field trials across years and locations, referred to as multi-environment trials (METs). These are an essential part of variety evaluation with the key aim of the statistical analysis of these datasets to accurately estimate the variety by environment (VE) effects. It has previously been thought that the number of varieties in common between environments, referred to as “variety connectivity,” was a key driver of the reliability of genetic variance parameter estimation and that this in turn affected the reliability of predictions of VE effects. In this paper we have provided the link between the objectives of this work and those in model-based experimental design. We propose the use of the D-optimality criterion as a diagnostic to capture the information available for the residual maximum likelihood (REML) estimation of the genetic variance parameters. We demonstrate the methods for a dataset with pedigree information as well as evaluating the performance of the diagnostic using two simulation studies. This measure is shown to provide a superior diagnostic to the traditional connectivity type measure in the sense of better forecasting the uncertainty of genetic variance parameter estimates.


INTRODUCTION
The objective of plant breeding is to breed superior plant varieties for various traits of economic importance. Selection of superior varieties is a result of the data analysis from a series of plant variety trials at a number of locations and possibly over several years, which are known as Multi-Environment Trials (METs). The breeding process is a progressive system involving the selection of superior varieties for further rounds of testing, removal of poor performing varieties and the inclusion of new varieties. This leads to datasets with varying levels of balance with respect to the number of varieties in common between environments, a measure known as "varietal connectivity." Unbalanced datasets are common when the MET spans multiple breeding stages and seasons.
Given that most MET datasets are unbalanced and exhibit variance heterogeneity (Cullis et al., 2000;Chapman et al., 2003), it is critical that appropriate statistical methodology is used for analysis. Various authors have recommended the use of factor analytic (Smith et al., 2001b) linear mixed models (FALMM) to model the variety effects in different environments (VE effects), with a one-stage approach referred to as the gold standard of analysis (Gogel et al., 2018). The FALMM approach has been shown to provide a parsimonious and computationally stable approximation to the fully saturated unstructured model (Kelly et al., 2007). It is used in many plant breeding programs within Australia and overseas due to its flexibility and ability to handle unbalanced datasets. Additionally, it provides reliable predictions of the VE effects which can then be summarised in an informative manner using the selection tools of Smith and Cullis (2018) and Smith et al. (2021b).
Not only is it important to use an appropriate method of analysis for MET data, it is also crucial to construct a suitable dataset. The latter has only recently been addressed in the literature. Smith et al. (2021a) outlined an approach for constructing MET datasets that optimises the information available for selection decisions. This is based on new concepts that characterise the structure of a breeding program, defining groups of varieties that enter the initial testing stage of the breeding program in the same year to have come from the same "contemporary group" (CG). MET datasets are formed by combining bands of data to trace the selection histories of varieties within CG to maximise the amount of direct information. For a given dataset the A-optimality criterion from the model-based design literature is used to quantify the information for any given selection decision. The criterion is based on the A-value which is the average pairwise variance of elementary variety contrasts. This is appropriate for assessing variety effects because a small average pairwise variance is synonymous with a low probability of making incorrect selection decisions (Bueno Filho and Gilmour, 2003). A-values are computed using a specified linear mixed model (LMM), requiring the specification of the fixed and random effects and values for the variance parameters. The Smith et al. (2021a) approach assumes known variance parameters, whereas in practise they are required to be estimated. Hence, the purpose of this paper is to develop a diagnostic that measures the likely reliability of genetic variance parameter estimates for a given MET dataset. This could then be used in conjunction with Smith et al. (2021a).
It had previously been thought that variety connectivity was a key driver of the reliability of genetic variance parameter estimation and that this in turn affected the reliability of predictions of VE effects (Smith et al., 2001a(Smith et al., , 2015Ward et al., 2019). To combat these concerns, problematic environments were often removed from MET datasets if they appeared to have insufficient numbers of varieties in common with other environments. However, there has been little work to establish whether variety connectivity is the most appropriate measure to use for this purpose. Lisle (pers. comm) found that although variety connectivity was influential, there appeared to be other factors at play. Additionally, the number of varieties in common between environments may not be relevant for analyzes in which information on genetic relatedness is included since "connectivity" in this case is a more general concept.
In this paper we address the issue in a formal manner by considering the information for estimation of genetic variance parameters. As in Smith et al. (2021a) we use model-based experimental design concepts and assess information using a prespecified LMM. In this setting our interest lies in the reliability of genetic variance parameter estimates across all environments, and for individual environments. We therefore consider the Doptimality criterion which measures the generalised variance of parameter estimates (Butler, 2013;Russell, 2018). This measure is typically used in the context of design searches for estimating (fixed) treatment effects (Atkinson et al., 2007), whereas the application to the problem of variance parameter estimation is not common in the literature. However, Loeza-Serrano and Donev (2014) and Nuga et al. (2017) considered D-optimality to search for experimental designs for estimating variance components. As in Smith et al. (2021a) our application does not involve a design search but rather the quantification of information for a given design (dataset).
In this paper we consider the impact of MET dataset structure on the reliability of residual maximum likelihood (REML) estimates of genetic variance parameters by proposing Doptimality values as a diagnostic measure. The paper is arranged as follows. Section 2 describes the statistical methodology, including a one-stage and two-stage approach for computing the diagnostic. The methods allow for the inclusion of information on genetic relatedness. In section 3, we apply the diagnostic to a durum wheat MET dataset, then examine the performance of the diagnostic using two simulation studies. Some concluding remarks are given in section 4.

Model for Analysis
Let y j denote the n j −vector of data for the j th environment, j = 1, . . . , p. We then let y denote the n−vector of data combined across all environments in the MET, so write y = (y ⊤ 1 , y ⊤ 2 , . . . , y ⊤ p ) ⊤ . Note that n = p j=1 n j . The LMM for y can be written as: where τ is a vector of fixed effects with associated design matrix X; u g is the vector of random genetic effects with associated design matrix Z g ; u p is a vector of random nongenetic (or peripheral) effects with associated design matrix Z p and e = (e ⊤ 1 , e ⊤ 2 , . . . , e ⊤ p ) ⊤ is the combined vector of residuals from all environments. The vector of fixed effects includes mean parameters for individual environments. The vector of random peripheral effects includes effects associated with the experimental designs within environments. It is assumed: where the matrices G g , G p and R are variance matrices for u g u p and e, respectively. G g is known as the between environments variance/covariance matrix and is described in later sections. G p is assumed to be block diagonal given by G p = ⊕ b i=1 σ 2 p i I q i where b is the number of components in u p and q i is the number of effects in (length of) u p i . R is assumed to be block diagonal, so that R = ⊕ p j=1 R j where R j = var e j is the variance matrix for the residuals for the j th environment. In the LMM of Smith et al. (2001b), spatial models are used for the residuals and the matrices R j correspond to separable autoregressive processes.
The random genetic effects u g comprise the variety effects nested within environments, and will be referred to as the VE effects. If we let m denote the total number of unique varieties across all environments, then the vector u g has length mp, which is ordered as varieties within environments. In this paper we allow for the use of pedigree information, so we partition the VE effects into additive and non-additive (residual VE) effects (Oakey et al., 2007) as follows: It is assumed that var (u a ) = G a ⊗ A where A is the numerator relationship matrix and G a is a p × p symmetric positive (semi)-definite matrix that will be referred to as the between environment additive genetic variance matrix. In terms of the non-additive effects, it is assumed that var (u e ) = G e ⊗ I m where G e is a p × p symmetric positive (semi)-definite matrix that will be referred to as the between environment non-additive genetic variance matrix. The variance matrix of the total VE effects (that is, additive plus non-additive) is therefore given by: Note that if no pedigree information is included in the analysis then u g = u e and G g = G e ⊗ I m . Finally, the variance matrix for the data vector is given by: The first step in fitting the model in Equation (1) is the estimation of the variance parameters associated with the random effects and residuals. We let κ denote the vector of (unknown) variance parameters and let n κ be the associated number of parameters. We use residual maximum likelihood (REML) estimation which requires calculation of the REML scores for the elements of κ. These are given by: being any generalised inverse of X ⊤ H −1 X . The "dot" notation indicates a derivative so thatḢ i = ∂H/∂κ i , i = 1 . . . n κ . The REML estimate of κ is obtained by equating the scores to zero and will be denoted byκ. This typically requires an iterative scheme. A computationally efficient scheme is the average information algorithm of Gilmour et al. (1995) which is a Fisher scoring algorithm in which the average information matrix, I A , is used instead of the expected information matrix, I E . The elements of these matrices are given by: Given the REML estimates of the variance parameters we can then compute empirical best linear unbiased estimates (EBLUEs) of the fixed effects and empirical best linear unbiased predictions (EBLUPs) of the random effects in Equation (1). In particular, the EBLUPs of the VE effects are given byũ g = G g Z g ⊤ Py and these have an associated prediction error variance of var ũ g − u g = G g − G g Z g ⊤ PZ g G g . Note that in these equations the matrices G g and P are formed using the REML estimateκ of κ. We can then compute a model based reliability (Mrode and Thompson, 2005) for an individual VE effect prediction as the square of the correlation between the true effect and the EBLUP. For the k th VE effect, this is obtained as: where the subscript "kk" indicates the k th diagonal element of the associated matrix.

Information Based Diagnostic for Genetic Variance Parameter Estimation
An asymptotic variance matrix for the REML estimates of the variance parameters can be obtained as the inverse of the information matrix. This could either be the average information matrix or, more traditionally, the expected information matrix. For the purposes of developing a diagnostic, we use the latter as it does not depend on the data, the elements of which are given in Equation (6). In this paper the interest lies in the estimation of genetic variance parameters, so we partition the variance parameters as κ = κ g ⊤ , κḡ ⊤ ⊤ , where κ g are the genetic variance parameters associated with G a and G e and κḡ are the remaining variance parameters, that is, associated with the peripheral random effects and residuals. We partition the full expected information matrix accordingly and write as: The asymptotic variance matrix forκ g can then be obtained as: Smith and Cullis (2018) recommend the use of factor analytic models for G a and G e . Other possibilities include compound symmetric and unstructured forms. Irrespective of the form used, the parameters of interest are the variances and covariances in G a and G e . The aim in this paper is to develop a diagnostic that reflects the information available to estimate these parameters but which does not require the fitting of the full LMM. In order to achieve this we apply some of the concepts from modelbased design in which the aim is to search a design space for a configuration which is optimal in some sense under a prespecified LMM. The latter includes specification of the terms in the model and also values for the variance parameters. Although the aim here is not to search a design space but rather to assess a particular design (dataset) we can proceed in a similar manner by considering a pre-specified LMM. In order to simplify computations but enable wide applicability we use a LMM that has a relatively simple structure for the non-genetic effects. In terms of the model in Equation (1) we assume that the fixed effects comprise a mean parameter for each environment (so that τ = τ 1 , τ 2 , . . . , τ p ⊤ ) and we assume there are no peripheral effects so write: where the design matrices are given by X = ⊕ p j=1 1 n j and Z g = ⊕ p j=1 Z g j where Z g j is the n j × m design matrix for the VE effects for environment j (= 1, . . . , p). The genetic variance matrices, G a and G e , are assumed to have unstructured forms with p(p + 1)/2 unique variance parameters in each that are denoted by σ a js and σ e js (j ≤ s = 1, . . . , p), respectively.
Finally, we assume that the residual variance matrices are given by R j = σ 2 j I n j so that R = ⊕ p j=1 σ 2 j I n j . The variance matrix for the data vector is then given by: and the unknown variance parameters are κ = κ g ⊤ , κḡ ⊤ ⊤ where κ g comprises σ a js and σ e js (j ≤ s = 1, . . . , p) and κḡ comprises σ 2 j (j = 1, . . . , p). We then use pre-specified values of these parameters to compute the information matrix in Equation (8) and thence the variance matrix in Equation (9). The chosen variance parameters will be denoted κ g 0 and κḡ 0 and the resultant variance matrix denoted by (κ g 0 , κ g ⊤ 0 ). We then consider the D-optimality criterion of model-based design because it is used to search for designs that minimise the generalised variance of parameter estimates. In our setting we wish to measure the generalised variance of the genetic variance parameter estimates for a given dataset. This can be obtained for the complete set of genetic variance parameters as: where the vertical bar represents the determinant and n κ g is the number of genetic variance parameters which is used as a divisor to provide a scaling for comparisons across models and/or datasets.
Although the overall D-value is of interest, our focus is on individual environments and their relative contribution to the reliability of genetic variance parameter estimation. We therefore also compute a D-value for environment j (= 1, . . . , p) as: where (κ g 0 j , κ g ⊤ 0 j ) is the partition of (κ g 0 , κ g ⊤ 0 ) that relates to environment j and n κ g j is the associated number of genetic variance parameters. In the case of models in which information on genetic relatedness is not used we have n κ g j = p, and the parameters are the genetic variance for the environment and all p − 1 genetic covariances with other environments. In models in which the genetic effects are partitioned into additive and non-additive effects we have n κ g j = 2p. To distinguish between these different genetic models we label the diagnostic values as D j (A+I) if they correspond to a LMM with both additive (A) and non-additive (or independent, I) VE effects; D j (I) if they correspond to the LMM with independent VE effects alone (that is, genetic relatedness is not used) or D j (A) if they correspond to the LMM with additive VE effects alone. Irrespective of the genetic model used, the diagnostic values for all p environments can then be scrutinised in various ways in order to check for "problem" environments with large values, which represent those environments with large variance.
Finally, a further computational simplification can be made in the calculation of (κ g , κ g ⊤ ) by using the marginal variance matrix for the genetic variance parameters rather than the conditional matrix as given in Equation (9). Thus, we can use: This is a reasonable simplification given that the non-genetic variance parameters in the pre-specified LMM are simply the residual variances so that the uncertainty associated with their estimation is likely to be small.

A Two-Stage Procedure
We first let d j be the number of varieties in environment j (= 1, . . . , p) and define d = p j=1 d j to be the number of VE combinations present in the data. Then note that formation of (κ g 0 , κ g ⊤ 0 ) using Equations (9) or (13) involves calculating traces of matrices of dimension n. The dimensionality of the problem can be reduced by considering a two-stage approximation to the LMM as described in Gogel et al. (2018). Given the simple form for the model in Equation (10) and the associated variance matrices, we may expect little loss in using this approach and the benefit is a reduction in dimensionality from n (total number of plots in the dataset) to d (total number of VE combinations present).
In the first stage of the two-stage approach, a separate analysis is conducted for each environment in order to obtain predicted variety means and a measure of their uncertainty. In these analyzes the variety effects are regarded as fixed effects. The predicted means are combined across environments to form the data for the second stage analysis. We adopt the notation of Gogel et al. (2018) so let η denote the full mp × 1 vector of variety mean parameters for individual environments and let η d be the d × 1 sub-vector corresponding to the VE combinations present in the data. Thus, we can write η d = Dη where D is a d × mp indicator matrix that selects the appropriate elements. We letη d be the vector of predicted variety means for individual environments from the first stage. In our setting the individual environment analyzes are particularly simple, involving only a single set of effects, namely the fixed variety effects, and the residual variance for environment j is simply σ 2 j I n j . This means that the variance matrix ofη d from the first stage is given by = ⊕ p j=1 j where Frontiers in Plant Science | www.frontiersin.org Number of: trials for each stage of testing (S1, S2, and S3), total trials, plots and varieties.
j is the d j × d j diagonal matrix given by σ 2 j diag 1/r ji , where r ji is the number of plots of variety i in environment j.
The LMM for the second stage combined analysis of the p environments can then be written as: where y 2 =η d from the first stage, and X 2 = D I p ⊗ 1 m . In terms of the variance structures, var (u a ) = G a ⊗ A and var (u e ) = G e ⊗ I m (as in the one-stage analysis) and var (ξ ) = where this is known from the first stage. The variance matrix for the data vector in the second stage is then given by: Elements of the expected information matrix for the variance parameters in the second stage LMM are then given by: This now involves matrices of dimension d rather than n. As in the previous section 2.2 we do not actually conduct the two-stage analysis but compute the expected information matrix using Equation (16) for a given choice of variance parameter values. We then form the (marginal) variance matrix for the genetic variance parameter estimates using Equation (13) and denote the resultant matrix by . This is substituted into Equations (11) and (12) to compute the diagnostic.
A β-version of the R script file to calculate D-values is available upon request to the corresponding author.

Application of Diagnostic to a MET Dataset
The motivating MET dataset of this paper comes from the Durum Breeding Australia North program. Here we present some key summary information and apply the diagnostic procedure. Summary information for the durum wheat dataset is given in Table 1. The dataset comprised m = 3, 708 varieties from n = 9, 786 plots corresponding to 40 trials from breeding stages Stage 1 (S1) to Stage 3 (S3) across p = 13 environments sown between 2014 and 2018. The number of varieties per environment ranged from 96 to 1,649 with a median of 105. We note that there are d = 6,168 variety by environment combinations present in the data, representing a nearly 40% reduction when using a 2-stage approach for computing the diagnostic. The pedigree information comprised 3,959 records and included all the varieties in the MET dataset. The inbreeding coefficients of the latter ranged from 0.750 to 0.998 with a mean of 0.911. The number of varieties in common between pairs of environments (displayed in a heatmap in Figure 1) ranged from 3 to 485 with a median of 36.
Given that the analysis of the durum data for the purposes of variety selection would proceed using a LMM with the partitioning of the VE effects into additive and non-additive effects, we computed the I 2E based on this model (so that n κ g = 182). The values of the variance parameters for calculation of the diagnostic were set to: σ a js = 0.10; j = s = 1, . . . , p = 0.08; j < s = 1, . . . , p Frontiers in Plant Science | www.frontiersin.org These values were chosen as being both representative of actual estimates from historical analyzes that are often encountered in practise and of a magnitude that could allow the diagnostic to provide good discrimination between environments. In particular, we have set the additive genetic variance to 80% of the total genetic variance (see Equation 3) for each environment and therefore 20% for the non-additive genetic variance), and a between environments correlation of 0.8 for both additive and non-additive VE effects. The D j (A+I)-values for each environment from this prespecified LMM are given in Table 2. The environments 2016-Tworth and 2015-Tworth had the smallest D j (A+I)-values and therefore the greatest information to estimate genetic variance parameters. Whereas, 2018-Tworth, 2018-Gurley, and 2018-Breeza had the largest D j (A+I)-values and therefore the least information to estimate genetic variance parameters.
Given the high percentage of additive genetic variance we also computed the simpler diagnostic, based on a LMM with additive VE effects alone. For this diagnostic n κ g = 91 representing a four-fold reduction in the number of elements in I 2E (κ i , κ j ) and hence a significant savings in computation. The resultant D j (A)values are presented in Figure 2A and Table 2, which show little difference compared with the D j (A+I)-values. In particular, Figure 2A shows an almost 1:1 relationship.
To investigate the robustness of the diagnostic we also inspect the D j (A+I)-and D j (A)-values with parameters set to those which are at the lower end of those seen in practise. The values of the variance parameters for calculation of the diagnostic were set to: In particular, we have set the additive genetic variance to 40% of the total genetic variance (see Equation 3) for each environment and therefore 60% for the non-additive genetic variance, and a between environments correlation of 0.4 for both additive and non-additive VE effects. The resultant D j -values are presented in both Figure 2 and Table 2. Once again there is little difference in the rankings of environments for D j (A+I) compared with D j (A) (Figure 2D). Additionally the rankings were robust to the two choices of variance parameters (high and low) used in forming the diagnostic, with the only noticeable change being associated with 2014-Breeza (Figures 2B,C). As a comparison with the historical measure of variety connectivity the D j (A+I) values computed using the high set of parameters have been plotted against mean connectivity in Figure 3. Two other structural characteristics of the environments are indicated on this plot, namely the number of varieties grown and the mean replication per variety. The figure shows that the diagnostic values encompass numerous structural elements of the environments.

LMM Without Pedigree Information
Within the framework of the LMM with independent VE effects it was previously thought that variety connectivity was a key driver of the reliability of variance parameter estimation and that this in turn affected the reliability of predictions of VE effects. We therefore consider a simulation study in which a range of connectivity levels is examined and assess the performance of both variety connectivity and the D-optimality diagnostic. For simplicity, and without loss of generality, we use p = 2 environments and label these as Env1 and Env2. Each environment has the same number of varieties (so that d 1 = d 2 ), and we vary the number of varieties in common (which is given by c = d − m). We assume the trials in Env1 and Env2 comprise 3 replicates and consider 4 sizes (Tsize) of trial corresponding to different numbers of varieties, namely d 1 (= d 2 ) = 12, 24, 48 and 96 so that n 1 (= n 2 ) = 36, 72, 144 and 288.
The simulation study for the first trial size (Tsize=12) is described in the following. We consider the connectivity levels c = 2, 4 . . . 12 (increments of 2). The maximum total number of varieties across Env1 and Env2 is m = 22, corresponding to c = 2. We label these varieties as V1 -V22. We assume that the 12 varieties in Env1 are always V1 -V12. The 12 varieties in Env2 are then V1 -V12 for c = 12; V2 -V13 for c = 11 and so on to V11 -V22 for c = 2. Our focus is on Env1 because this contains the same varieties across all connectivity levels so allows a fair comparison across these levels. The underlying LMM is as in Equation (10) but with independent VE effects alone (that is, without the additive VE effects) so that n κ g = 3. Given the data structure for each value of c and some pre-specified variance parameters, we can compute the diagnostic for Env1 which will be denoted D 1c (I). For the purposes of both the calculation of D 1c (I) and of data generation in the simulation study we chose the values of the variance parameters to be κ g 0 = σ e 11 0 = 0.2, σ e 12 0 = 0.16, σ e 22 0 = 0.2 ⊤ and κḡ 0 = σ 2 1 0 = 0.15, σ 2 2 0 = 0.15 ⊤ . The diagnostic is calculated using the two-stage formula for expected information, that is, as in Equation (16) and is given by: log c (σ e 11 0 , σ e 12 0 ), (σ e 11 0 , σ e 12 0 ) ⊤

(17)
In the simulation study, the steps for the t th simulation (t = 1 . . . N) are as follows: 1. Generate the random genetic effects u e and residuals e as per the LMM in Equation (10) and for the pre-specified variance parameters κ g 0 and κḡ 0 . In terms of the fixed effects, without loss of generality we choose τ 1 = τ 2 = 0. Note that we generate 2m genetic effects, where m = 22 which corresponds to the maximum total number of varieties across all connectivity levels. We denote the resultant vector for simulation t by u e t . The residuals for simulation t are denoted by e t which is a vector of length n 1 + n 2 = 72. 2. For the connectivity level c, we subset the appropriate 24 elements of u e t . We will label the associated vector as u e tc . We then form the data vector and fit the LMM as in Equation (10), without the inclusion of pedigree information. We save the REML estimates of the genetic variance parameters, denoting these asσ e 11 tc ,σ e 12 tc ,σ e 22 tc and save the EBLUPs of the genetic effects, denoting these asũ e tc . 3. Repeat step 2. for each value of c A total of N = 2, 000 simulations was conducted for each trial size by variety connectivity combination. The simulation based diagnostics and reliabilities were only computed for the LMMs in Step 2. that achieved convergence (with one update if required) and resulted in a positive definite form for the REML estimate of G e . All models in this paper were fitted using the ASReml-R package (Butler et al., 2017) within R (R Core Team, 2020). The simulations were conducted in order to obtain two main quantities of interest for each value of c, namely a measure of the reliability of the genetic variance parameter estimates and a measure of the reliability of the predicted variety effects for Env1. For the former, we computed a simulation based equivalent of the diagnostic in Equation (17), namely: where the determinant is with respect to the sample variance/covariance matrix of the REML estimates of the genetic variance parameters for Env1. In terms of the variety predictions, we computed the reliability of the EBLUPs for the 12 varieties that were always present in Env1, namely V1 -V12. For each value of c, the reliability for variety k(= 1 . . . 12) in Env1 was computed as the square of the sample correlation between the true (generated) effects (element of u e tc for the variety and Env1) and the EBLUPs (element ofũ e tc for the variety and Env1). This will be denoted R S kc . Noting that the simulation based reliabilities (R S kc ) of the variety predictions take into account the uncertainty in having to estimate the variance parameters, we compute analogues values that assume known variance parameters. These reliabilities therefore reflect the maximum possible values against which we can measure the loss attributable to variance parameter estimation. This was achieved by fitting, for each value of c, the LMM as per Equation (10) but with the variance parameters fixed at the value κ 0 . We then computed the model based reliability for variety k(= 1 . . . 12) in Env1 using Equation (7) but because this has been computed with respect to known variance parameters (not REML estimates) we will call it the design based reliability (R D kc ). We calculated the associated loss for the EBLUP reliabilities as: Finally we summarise these by taking means across the varieties in Env1 that were also present in Env2. We restrict the results to this set of varieties because in any MET analysis, there is a fundamental difference between varieties that were present in multiple environments (so-called connected varieties) and those that were present in a single environment only. The MET analysis, compared with separate analyzes of individual environments, has the potential to improve the reliability of predictions for connected varieties through the use of additional data. This is not the case for varieties present in a single environment only. Hence our focus is the connected varieties.

Results of Simulation Without Pedigree Information
First we note that the number of simulations in which the model fitting was successful (as defined in section 3.2.1) was strongly related to the number of varieties in common between the two environments (see Figure 4). The number of successful model fits for the connectivity level of c = 2 was particularly low and additionally the results were found to be unreliable.
Therefore, in what follows, the results for this connectivity level have been excluded. The relationship between the diagnostic D 1c (I)-values (from Equation 17) and the simulation based equivalent D S 1c (I)-values (from Equation 18) is shown in Figure 5A. This good agreement shown in Figure 5A clearly indicates that the diagnostic performs well in terms of forecasting the level of uncertainty in genetic variance parameter estimation. Figure 5B shows that there is a decreasing linear relationship between (log) variety connectivity and the uncertainty in genetic variance parameter estimation, but this is only within a given trial size, that is, for a given number of varieties. The connectivity measure fails for comparisons involving trials with different number of varieties. Figure 6 shows the mean losses in reliability of the EBLUPs of VE effects for Env1 for those varieties that were present in both environments. These are plotted against the diagnostic D 1c (I)values, with a separate panel for each trial size. Each point has been supplemented with a standard error of the mean (SEM) which was based on a pooled estimate of error across all trial sizes and connectivity levels. Thus differences in SEM reflect differences in the numbers of varieties used to compute the means (that is, differences in connectivity). The panels in this figure show that, for a given trial size, the loss in reliability of EBLUPs is well predicted by the diagnostic D 1c (I)-values. This also holds across trial sizes, although the relationship is more variable (Figure 7). FIGURE 4 | Independent VE effects simulation study: Number of successful model fits from N = 2, 000 simulations plotted against number of varieties in common for four trial sizes (trials with 12, 24, 48, and 96 varieties). Trial sizes (Tsize) are represented using different colours. Each point within Tsize corresponds to a different level of variety connectivity which ranges from c = 2 up to the number of varieties in a trial (representing 100% connectivity between the two trials). FIGURE 6 | Independent VE effects simulation study: Mean loss in reliability of the EBLUPs of VE effects for Env1 for those varieties that were present in both environments. Each panel corresponds to a different trial size (trials with 12, 24, 48, and 96 varieties) and the points correspond to a sequence of connectivity levels. Also shown are standard errors for each mean (vertical lines) and a loess smoother through the means for each Tsize. Figures 4, 5, 7 have been extracted for the "best case" scenario of 100% connectivity for each Tsize and are presented in Table 3. This again shows the good agreement between the diagnostic D 1c (I)-values and the simulation based D S 1c (I)-values and the relationship between the diagnostic and the loss in reliability of VE predictions. It also shows that, even with 100% connectivity, there were substantial problems with the smallest trial size in terms of all criteria (number of successful model fits, reliability of genetic variance parameter estimates and reliability of VE effect predictions).

LMM With Pedigree Information
We then extended the simulation study in order to assess the performance of the diagnostic in terms of correlated VE effects. To simplify the simulations, and without loss of generality, we considered the LMM as in Equation (1) but without the nonadditive VE effects, so that as in section 3.2.1, n κ g = 3. The set-up for the study is the same as in section 3.2.1 but we only consider the two larger trial sizes, namely d 1 (= d 2 ) = 48 and 96 so that n 1 (= n 2 ) = 144 and 288. Across the range of connectivity levels the total number of varieties required for the simulation was 191 (corresponding to c = 2 for the trial size of 96) and we label these as V1-V191. The simulation study requires a numerator relationship matrix (A) for these varieties. We therefore chose V1-V191 from the actual lines in Stage 3 (S3) and Stage 4 (S4) in 2018 and 2017 in the durum data and computed A from the associated pedigree information. For the chosen subset of varieties, the inbreeding coefficient ranged from 0.938 to 0.969 with a mean of 0.957.

Results of Simulation With Pedigree Information
As in the independent VE effects study, the number of simulations in which the model fitting was successful was FIGURE 7 | Independent VE effects simulation study: Mean loss in reliability of the EBLUPs of VE effects for Env1 for those varieties that were present in both environments. The colours correspond to different trial sizes (trials with 12, 24, 48, and 96 varieties) and the points for each colour correspond to a sequence of connectivity levels. Also shown are standard errors for each mean (vertical lines) and a loess smoother through all the means. related to the number of varieties in common between the two environments (see Figure 8). However, a key difference was that the number of successful model fits for the connectivity level of c = 2 was reasonable so these results have been included in what follows.
The results are presented in the same format as in section 3.2.1. The good agreement shown in Figure 9A clearly indicates that the diagnostic performs well in terms of forecasting the level of uncertainty in genetic variance parameter estimation in the presence of pedigree information. Figure 9B shows that there is a decreasing linear relationship between (log) variety connectivity and the uncertainty in genetic variance parameter estimation, but this only holds for trials with the same number of varieties.
The mean loss in reliability of the EBLUPs of the additive VE effects for Env1 for those varieties that were present in both environments is well predicted by the diagnostic D 1c (A)-values, both for individual trial sizes (Figure 10) and across trial sizes (Figure 11).
We again investigate the robustness to a change in variance parameters. The same low bound values used in the durum example are used and applied to the same set of simulation studies. These results are provided in Supplementary Materials and highlight similar trends to the results above.  FIGURE 10 | Additive VE effects simulation study: Mean loss in reliability of the EBLUPs of VE effects for Env1 for those varieties that were present in both environments. Each panel corresponds to a different trial size (trials with 48 and 96 varieties) and the points correspond to a sequence of connectivity levels. Also shown are standard errors for each mean (vertical lines) and a loess smoother through the means for each Tsize.

DISCUSSION
Despite the fact that Multi-environment trials (METs) are a key aspect of plant breeding, there is little in the literature that addresses the "design" of METs in the sense of determining which trials should be combined. Within the paradigm of the factor analytic linear mixed model (FALMM) approach for MET analysis, the aim of both the design and analysis is the reliable prediction of the variety effects for individual environments (VE effects). These are the baseline predictions which can then be summarised across environments in meaningful ways, for example, using the interaction class approach of Smith et al. (2021b). Smith et al. (2021a) discuss the structure of MET datasets and use a model-based design approach to quantify the amount of information for variety comparisons in a given dataset. They apply the A-optimality measure to variety effects to show the importance of including as many trials as are necessary to capture the selection histories of the varieties under consideration for selection. As in the case of standard model-based design, the calculation of A-values is based on a pre-specified linear mixed model (LMM) so assumes that the associated variance parameters are known. It is important, therefore, to also consider the impact of the structure of a MET dataset on the reliability of genetic variance parameter estimation, as this in turn may affect the reliability of variety predictions, so should be used in conjunction with A-optimality.
In this paper we have developed a diagnostic to be applied to a MET dataset prior to analysis in order to assess the likely reliability of genetic variance parameter estimates, both for the dataset across all environments and for individual environments. As in Smith et al. (2021a) we use a model-based design approach and apply D-optimality measures to genetic variance parameters. Two simulation studies, one using a LMM with independent VE effects and the other additive VE effects, showed that these diagnostic D-values performed well in the sense of predicting the actual reliability of genetic variance parameter estimates.
Historically, variety connectivity between environments was the measure calculated prior to the conduct of a MET analysis to investigate the likely reliability of genetic variance parameter estimation. This measure is simple to compute and intuitively reasonable but there has been nothing in the literature to validate its use. In our simulation studies it was shown that variety connectivity was only able to predict the reliability of genetic variance parameter estimation across connectivity levels for a given trial size (number of varieties in the trial), whereas the new D-optimality diagnostic predicted reliability across both connectivity levels and trial sizes. The application to the real example also suggested that D-optimality encapsulates numerous structural features of a MET data-set that are influential in determining the reliability of genetic variance parameter estimation. These features included, but are not limited to, variety connectivity, trial size and variety replication. Computation of the diagnostic requires specification of variance parameters, namely genetic and residual variances for individual environments and genetic correlations between pairs of environments. Clearly we do not know these values or how they differ between environments prior to the MET analysis. In terms of the latter a pragmatic and sensible approach is FIGURE 11 | Additive VE effects simulation study: mean loss in reliability of the EBLUPs of VE effects for Env1 for those varieties that were present in both environments. The colours correspond to different trial sizes (trials with 48 and 96 varieties) and the points for each colour correspond to a sequence of connectivity levels. Also shown are standard errors for each mean (vertical lines) and a loess smoother through all the means.
to assume homogeneity between environments. The parameter values can be chosen to reflect typical estimates obtained in practise. Both the simulation studies and the real example showed that the diagnostic was robust to the specification of these values.
The pre-specified LMM for the D-optimality diagnostic allowed for the estimation of separate genetic variances for all environments and separate genetic covariances for all pairs of environments. It is therefore targeting MET analyzes that employ factor analytic, or possibly unstructured forms for the genetic variance matrices. It is easy to modify the pre-specified LMM to reflect simpler models, for example, variance component models. Additionally, the partitioning of the genetic effects into additive and non-additive was achieved in this paper using pedigree information but the modification to use genomic (marker) data is straightforward.
The simulation study results suggested that trials with small numbers of varieties will, in general, have larger D-values when compared with trials with more varieties. Even in the case of 100% connectivity, the smallest trial size considered (12 varieties in each of the two trials) resulted in large D-values which then translated to substantial losses in the reliability of VE effect predictions. Additionally, the number of successful model fits was much lower compared with scenario in which there were more varieties in each trial. This is consistent with our experience in analysing MET datasets in which many trials have small numbers of varieties. Even when the connectivity between these and larger trials is high, there are often computational difficulties in fitting the FALMM. In practise, we therefore suggest that individual environment diagnostic values should be examined for a given MET dataset in order to identify environments with large D-values. These environments may contribute insufficient information for genetic variance parameter estimation so their inclusion in the MET dataset should be carefully considered. Additionally, examination of the overall diagnostic value across all environments may be useful. If the overall D-value is large this may indicate insufficient information to fit the gold standard FALMM and it may only be possible to fit a simpler model, such as a variance component model.

DATA AVAILABILITY STATEMENT
The data analysed in this study is subject to the following licenses/restrictions the datasets presented in this article are not readily available because they are owned by New South Wales Department of Primary Industries and the Grains Research and Development Corporation. Requests to access these datasets should be directed to Gururaj Kadkol, gururaj.kadkol@dpi.nsw. gov.au.

AUTHOR CONTRIBUTIONS
CL, BC, and AS conceived the ideas. CL developed and applied the methodology, prepared the data, conducted the simulation study, and prepared the first draft of the manuscript. All authors contributed to the general discussions, manuscript revision, read, and approved the submitted version.

FUNDING
CL and AS were supported by the Grains Research and Development Corporation (GRDC) through the Statistics for the Australian Grains Industry (SAGI) project (UW00009).